{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Graphical Causal Models\n",
    "\n",
    "\n",
    "## Thinking About Causality\n",
    "\n",
    "Have you ever noticed how those cooks on YouTube videos are excellent at describing food? \"Reduce the sauce until it reaches a velvety consistency.\" If you are just starting to learn how to cook, you have no idea what this even means. Just tell me how long I should leave this thing on the stove! Causality is the same. If you walk into a bar and hear folks discussing causality (probably a bar next to an economics department), you will hear them say how the confounding of income made it challenging to identify the immigration effect on that neighborhood, so they had to use an instrumental variable. By now, you might not understand what they are talking about. But I'll fix at least some of this problem right now. \n",
    "\n",
    "Graphical models are the language of causality. They are what you use not only to talk with other brave and true causality aficionados, but also to make your own thoughts clearer. \n",
    "\n",
    "As a starting point, take the conditional independence of the potential outcomes. This is one of the main assumptions we require to hold when doing causal inference:\n",
    "\n",
    "$\n",
    "(Y_0, Y_1) \\perp T | X\n",
    "$\n",
    "\n",
    "Conditional independence makes it possible for us to measure an effect on the outcome that is solely due to the treatment, and not to any other variable lurking around. The classic example is the effect of a medicine on ill patients. If only severely ill patients get the drug, it might even look like giving the drug worsens the patients' health. That is because the effect of severity is getting mixed up with the effect of the drug. If, however, we break down the patients into severe and non-severe cases and look at the drug's impact in each subgroup, we get a clearer picture of the true effect. Breaking down the population by its features is what we call controlling for, or conditioning on, X. By conditioning on severity, the treatment mechanism becomes as good as random, provided severity is the only thing that drives who gets the drug. Within the severe group, whether a patient receives the drug is then due only to chance, not to high severity, since all patients in the group are the same on this dimension. And if treatment is as good as randomly assigned within groups, the treatment becomes conditionally independent of the potential outcomes. \n",
    "\n",
    "Independence and conditional independence are central to causal inference. Yet it can be quite challenging to wrap our heads around them. This can change if we use the right language to describe the problem, and that is where **causal graphical models** come in. A causal graphical model is a way to represent how causality works in terms of what causes what. \n",
    "\n",
    "A graphical model looks like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import graphviz as gr\n",
    "from matplotlib import style\n",
    "import seaborn as sns\n",
    "from matplotlib import pyplot as plt\n",
    "style.use(\"fivethirtyeight\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
       " -->\n",
       "<!-- Title: %3 Pages: 1 -->\n",
       "<svg width=\"290pt\" height=\"188pt\"\n",
       " viewBox=\"0.00 0.00 289.56 188.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
       "<title>%3</title>\n",
       "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-184 285.5624,-184 285.5624,4 -4,4\"/>\n",
       "<!-- Z -->\n",
       "<g id=\"node1\" class=\"node\">\n",
       "<title>Z</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Z</text>\n",
       "</g>\n",
       "<!-- X -->\n",
       "<g id=\"node2\" class=\"node\">\n",
       "<title>X</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X</text>\n",
       "</g>\n",
       "<!-- Z&#45;&gt;X -->\n",
       "<g id=\"edge1\" class=\"edge\">\n",
       "<title>Z&#45;&gt;X</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M27,-143.8314C27,-136.131 27,-126.9743 27,-118.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"30.5001,-118.4132 27,-108.4133 23.5001,-118.4133 30.5001,-118.4132\"/>\n",
       "</g>\n",
       "<!-- U -->\n",
       "<g id=\"node3\" class=\"node\">\n",
       "<title>U</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"99\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">U</text>\n",
       "</g>\n",
       "<!-- U&#45;&gt;X -->\n",
       "<g id=\"edge2\" class=\"edge\">\n",
       "<title>U&#45;&gt;X</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M83.7307,-146.7307C73.803,-136.803 60.6847,-123.6847 49.5637,-112.5637\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"51.7933,-109.8436 42.2473,-105.2473 46.8436,-114.7933 51.7933,-109.8436\"/>\n",
       "</g>\n",
       "<!-- Y -->\n",
       "<g id=\"node4\" class=\"node\">\n",
       "<title>Y</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"99\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Y</text>\n",
       "</g>\n",
       "<!-- U&#45;&gt;Y -->\n",
       "<g id=\"edge3\" class=\"edge\">\n",
       "<title>U&#45;&gt;Y</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M99,-143.8314C99,-136.131 99,-126.9743 99,-118.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"102.5001,-118.4132 99,-108.4133 95.5001,-118.4133 102.5001,-118.4132\"/>\n",
       "</g>\n",
       "<!-- medicine -->\n",
       "<g id=\"node5\" class=\"node\">\n",
       "<title>medicine</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"206\" cy=\"-90\" rx=\"42.5369\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"206\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">medicine</text>\n",
       "</g>\n",
       "<!-- survived -->\n",
       "<g id=\"node6\" class=\"node\">\n",
       "<title>survived</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"241\" cy=\"-18\" rx=\"40.6248\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"241\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">survived</text>\n",
       "</g>\n",
       "<!-- medicine&#45;&gt;survived -->\n",
       "<g id=\"edge4\" class=\"edge\">\n",
       "<title>medicine&#45;&gt;survived</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M214.6517,-72.2022C218.6481,-63.981 223.4801,-54.041 227.909,-44.9301\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"231.1023,-46.3664 232.3265,-35.8425 224.8068,-43.306 231.1023,-46.3664\"/>\n",
       "</g>\n",
       "<!-- severeness -->\n",
       "<g id=\"node7\" class=\"node\">\n",
       "<title>severeness</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"225\" cy=\"-162\" rx=\"47.841\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"225\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">severeness</text>\n",
       "</g>\n",
       "<!-- severeness&#45;&gt;medicine -->\n",
       "<g id=\"edge6\" class=\"edge\">\n",
       "<title>severeness&#45;&gt;medicine</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M220.2055,-143.8314C218.1195,-135.9266 215.6286,-126.4872 213.3194,-117.7365\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"216.6629,-116.6894 210.7272,-107.9134 209.8946,-118.4755 216.6629,-116.6894\"/>\n",
       "</g>\n",
       "<!-- severeness&#45;&gt;survived -->\n",
       "<g id=\"edge5\" class=\"edge\">\n",
       "<title>severeness&#45;&gt;survived</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M238.986,-144.6766C246.1977,-134.6158 254.2505,-121.3262 258,-108 263.8235,-87.3027 258.7526,-63.3614 252.6712,-45.377\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"255.8866,-43.9796 249.1504,-35.8019 249.3167,-46.3955 255.8866,-43.9796\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x11ddd4f28>"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"Z\", \"X\")\n",
    "g.edge(\"U\", \"X\")\n",
    "g.edge(\"U\", \"Y\")\n",
    "\n",
    "g.edge(\"medicine\", \"survived\")\n",
    "g.edge(\"severeness\", \"survived\")\n",
    "g.edge(\"severeness\", \"medicine\")\n",
    "\n",
    "g"
   ]
  },
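  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the medicine example concrete, here is a minimal simulation of the second graph above. The data-generating process is a hypothetical assumption chosen for illustration and is not real data. Half the patients are severe. The drug goes to 90% of severe patients and 10% of mild ones. The drug adds 10 percentage points to the survival probability, and severity removes 50 points. Under these assumptions, the naive treated-versus-untreated comparison should make the drug look harmful. Comparing within each severity group should recover roughly the assumed 10-point benefit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "rng = np.random.default_rng(42)\n",
    "n = 100_000\n",
    "\n",
    "# Hypothetical process: severeness -> medicine, severeness -> survived, medicine -> survived\n",
    "severe = rng.binomial(1, 0.5, n)\n",
    "# severe patients are far more likely to receive the drug\n",
    "medicine = rng.binomial(1, np.where(severe == 1, 0.9, 0.1))\n",
    "# the drug adds 0.1 to the survival probability; severity subtracts 0.5\n",
    "survived = rng.binomial(1, 0.8 - 0.5 * severe + 0.1 * medicine)\n",
    "\n",
    "df = pd.DataFrame({'severe': severe, 'medicine': medicine, 'survived': survived})\n",
    "\n",
    "# Naive comparison: the drug effect is mixed up with the severity effect\n",
    "naive = df.groupby('medicine')['survived'].mean()\n",
    "naive_diff = naive[1] - naive[0]\n",
    "\n",
    "# Conditioning on severity: treated minus untreated within each stratum\n",
    "by_stratum = df.groupby(['severe', 'medicine'])['survived'].mean().unstack()\n",
    "stratum_diff = by_stratum[1] - by_stratum[0]\n",
    "\n",
    "print('naive difference:', round(naive_diff, 3))\n",
    "print('difference within each severity group:')\n",
    "print(stratum_diff.round(3))"
   ]
  },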
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each node is a random variable. We use arrows, or edges, to show that one variable causes another. In the first graphical model above, we are saying that Z causes X and that U causes X and Y. To give a more concrete example, we can translate our thoughts about the impact of the medicine on patient survival into the second graph above. Severeness causes both medicine and survival, and medicine also causes survival. As we will see, the language of causal graphical models helps us think about causality more clearly, as it makes our beliefs about how the world works explicit. \n",
    "\n",
    "## Crash Course in Graphical Models\n",
    "\n",
    "There are [whole semesters on graphical models](https://www.coursera.org/specializations/probabilistic-graphical-models). But, for our purposes, what really matters is understanding what kind of independence and conditional independence assumptions a graphical model entails. As we shall see, dependence flows through a graphical model like water flows through a stream. We can stop this flow or enable it, depending on how we treat the variables in it. To understand this, let's examine some common graphical structures and examples. They are quite simple, but they are sufficient building blocks to understand independence and conditional independence in graphical models.\n",
    "\n",
    "First, look at this very simple graph. A causes C, which causes B. Or X causes Y, which causes Z."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
       " -->\n",
       "<!-- Title: %3 Pages: 1 -->\n",
       "<svg width=\"298pt\" height=\"188pt\"\n",
       " viewBox=\"0.00 0.00 298.22 188.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
       "<title>%3</title>\n",
       "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-184 294.2158,-184 294.2158,4 -4,4\"/>\n",
       "<!-- A -->\n",
       "<g id=\"node1\" class=\"node\">\n",
       "<title>A</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">A</text>\n",
       "</g>\n",
       "<!-- C -->\n",
       "<g id=\"node2\" class=\"node\">\n",
       "<title>C</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">C</text>\n",
       "</g>\n",
       "<!-- A&#45;&gt;C -->\n",
       "<g id=\"edge1\" class=\"edge\">\n",
       "<title>A&#45;&gt;C</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M27,-143.8314C27,-136.131 27,-126.9743 27,-118.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"30.5001,-118.4132 27,-108.4133 23.5001,-118.4133 30.5001,-118.4132\"/>\n",
       "</g>\n",
       "<!-- B -->\n",
       "<g id=\"node3\" class=\"node\">\n",
       "<title>B</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">B</text>\n",
       "</g>\n",
       "<!-- C&#45;&gt;B -->\n",
       "<g id=\"edge2\" class=\"edge\">\n",
       "<title>C&#45;&gt;B</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M27,-71.8314C27,-64.131 27,-54.9743 27,-46.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"30.5001,-46.4132 27,-36.4133 23.5001,-46.4133 30.5001,-46.4132\"/>\n",
       "</g>\n",
       "<!-- X -->\n",
       "<g id=\"node4\" class=\"node\">\n",
       "<title>X</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"99\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X</text>\n",
       "</g>\n",
       "<!-- Y -->\n",
       "<g id=\"node5\" class=\"node\">\n",
       "<title>Y</title>\n",
       "<ellipse fill=\"none\" stroke=\"#ff0000\" cx=\"99\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Y</text>\n",
       "</g>\n",
       "<!-- X&#45;&gt;Y -->\n",
       "<g id=\"edge3\" class=\"edge\">\n",
       "<title>X&#45;&gt;Y</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M99,-143.8314C99,-136.131 99,-126.9743 99,-118.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"102.5001,-118.4132 99,-108.4133 95.5001,-118.4133 102.5001,-118.4132\"/>\n",
       "</g>\n",
       "<!-- Z -->\n",
       "<g id=\"node6\" class=\"node\">\n",
       "<title>Z</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"99\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Z</text>\n",
       "</g>\n",
       "<!-- Y&#45;&gt;Z -->\n",
       "<g id=\"edge4\" class=\"edge\">\n",
       "<title>Y&#45;&gt;Z</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M99,-71.8314C99,-64.131 99,-54.9743 99,-46.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"102.5001,-46.4132 99,-36.4133 95.5001,-46.4133 102.5001,-46.4132\"/>\n",
       "</g>\n",
       "<!-- causal knowledge -->\n",
       "<g id=\"node7\" class=\"node\">\n",
       "<title>causal knowledge</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"217\" cy=\"-162\" rx=\"73.4323\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"217\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">causal knowledge</text>\n",
       "</g>\n",
       "<!-- solve problems -->\n",
       "<g id=\"node8\" class=\"node\">\n",
       "<title>solve problems</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"217\" cy=\"-90\" rx=\"63.7949\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"217\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">solve problems</text>\n",
       "</g>\n",
       "<!-- causal knowledge&#45;&gt;solve problems -->\n",
       "<g id=\"edge5\" class=\"edge\">\n",
       "<title>causal knowledge&#45;&gt;solve problems</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M217,-143.8314C217,-136.131 217,-126.9743 217,-118.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"220.5001,-118.4132 217,-108.4133 213.5001,-118.4133 220.5001,-118.4132\"/>\n",
       "</g>\n",
       "<!-- job promotion -->\n",
       "<g id=\"node9\" class=\"node\">\n",
       "<title>job promotion</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"217\" cy=\"-18\" rx=\"60.429\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"217\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">job promotion</text>\n",
       "</g>\n",
       "<!-- solve problems&#45;&gt;job promotion -->\n",
       "<g id=\"edge6\" class=\"edge\">\n",
       "<title>solve problems&#45;&gt;job promotion</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M217,-71.8314C217,-64.131 217,-54.9743 217,-46.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"220.5001,-46.4132 217,-36.4133 213.5001,-46.4133 220.5001,-46.4132\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x11dedc4e0>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"A\", \"C\")\n",
    "g.edge(\"C\", \"B\")\n",
    "\n",
    "g.edge(\"X\", \"Y\")\n",
    "g.edge(\"Y\", \"Z\")\n",
    "g.node(\"Y\", \"Y\", color=\"red\")  # red marks the variable we condition on\n",
    "\n",
    "\n",
    "g.edge(\"causal knowledge\", \"solve problems\")\n",
    "g.edge(\"solve problems\", \"job promotion\")\n",
    "\n",
    "g"
   ]
  },
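  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before walking through the chain, here is a minimal linear simulation of the job promotion example. The coefficients (2 and 3) and the Gaussian noise are hypothetical assumptions. In a linear setting like this one, we can approximate conditioning on a variable by including it as a regressor. Causal knowledge and promotion should be strongly correlated on their own, but once problem solving is in the regression, the coefficient on causal knowledge should be close to zero."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "n = 100_000\n",
    "\n",
    "# Hypothetical linear chain: causal knowledge -> solve problems -> job promotion\n",
    "knowledge = rng.normal(size=n)\n",
    "solve = 2 * knowledge + rng.normal(size=n)\n",
    "promotion = 3 * solve + rng.normal(size=n)\n",
    "\n",
    "# Without conditioning, knowledge and promotion are strongly dependent\n",
    "marginal_corr = np.corrcoef(knowledge, promotion)[0, 1]\n",
    "\n",
    "# Condition on the mediator by regressing promotion on an intercept, solve and knowledge\n",
    "X = np.column_stack([np.ones(n), solve, knowledge])\n",
    "coefs, *_ = np.linalg.lstsq(X, promotion, rcond=None)\n",
    "\n",
    "print('corr(knowledge, promotion):', round(marginal_corr, 3))\n",
    "print('coefficient on knowledge given solve:', round(coefs[2], 3))"
   ]
  },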
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the first graph, dependence flows in the direction of the arrows. To give a more concrete example, let's say that knowing about causal inference is the only way to solve business problems, and solving those problems is the only way to get a job promotion. So causal knowledge causes problem solving, which causes job promotion. We can say here that job promotion is dependent on causal knowledge: the greater your causal knowledge, the greater your chances of getting a promotion. Notice that dependence is symmetric, although this direction is a little less intuitive. If you got a promotion, it is more likely that you have causal knowledge; otherwise, it would have been difficult to get promoted. \n",
    "\n",
    "Now, let's say I condition on the intermediary variable. In this case, the dependence is blocked, so X and Z are independent given Y. By the same token, in our example, if I know that you are good at solving problems, knowing that you know causal inference gives no further information about your chances of getting a job promotion. In mathematical terms, \\\\(E[Promotion|Solve \\ problems, Causal \\ knowledge]=E[Promotion|Solve \\ problems]\\\\). The reverse is also true: once I know how good you are at solving problems, knowing your job promotion status gives me no further information about how likely you are to know causal inference. \n",
    "\n",
    "As a general rule, dependence along the directed path from A to B is blocked when we condition on an intermediary variable C. That is,\n",
    "\n",
    "$A \\not\\!\\perp\\!\\!\\!\\perp B$\n",
    "\n",
    "and\n",
    "\n",
    "$\n",
    "A \\!\\perp\\!\\!\\!\\perp B | C\n",
    "$\n",
    "\n",
    "Now, let's consider a fork structure, where the same variable causes two other variables down the graph. Here, dependence flows backward through the arrows, and we have what is called a **backdoor path**. We can close the backdoor path and shut down dependence by conditioning on the common cause."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
       " -->\n",
       "<!-- Title: %3 Pages: 1 -->\n",
       "<svg width=\"591pt\" height=\"116pt\"\n",
       " viewBox=\"0.00 0.00 591.25 116.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 112)\">\n",
       "<title>%3</title>\n",
       "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-112 587.2468,-112 587.2468,4 -4,4\"/>\n",
       "<!-- C -->\n",
       "<g id=\"node1\" class=\"node\">\n",
       "<title>C</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"63\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"63\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">C</text>\n",
       "</g>\n",
       "<!-- A -->\n",
       "<g id=\"node2\" class=\"node\">\n",
       "<title>A</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">A</text>\n",
       "</g>\n",
       "<!-- C&#45;&gt;A -->\n",
       "<g id=\"edge1\" class=\"edge\">\n",
       "<title>C&#45;&gt;A</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M54.2854,-72.5708C50.0403,-64.0807 44.8464,-53.6929 40.1337,-44.2674\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"43.237,-42.6477 35.6343,-35.2687 36.976,-45.7782 43.237,-42.6477\"/>\n",
       "</g>\n",
       "<!-- B -->\n",
       "<g id=\"node3\" class=\"node\">\n",
       "<title>B</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"99\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">B</text>\n",
       "</g>\n",
       "<!-- C&#45;&gt;B -->\n",
       "<g id=\"edge2\" class=\"edge\">\n",
       "<title>C&#45;&gt;B</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M71.7146,-72.5708C75.9597,-64.0807 81.1536,-53.6929 85.8663,-44.2674\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"89.024,-45.7782 90.3657,-35.2687 82.763,-42.6477 89.024,-45.7782\"/>\n",
       "</g>\n",
       "<!-- X -->\n",
       "<g id=\"node4\" class=\"node\">\n",
       "<title>X</title>\n",
       "<ellipse fill=\"none\" stroke=\"#ff0000\" cx=\"207\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"207\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X</text>\n",
       "</g>\n",
       "<!-- Y -->\n",
       "<g id=\"node5\" class=\"node\">\n",
       "<title>Y</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"171\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"171\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Y</text>\n",
       "</g>\n",
       "<!-- X&#45;&gt;Y -->\n",
       "<g id=\"edge3\" class=\"edge\">\n",
       "<title>X&#45;&gt;Y</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M198.2854,-72.5708C194.0403,-64.0807 188.8464,-53.6929 184.1337,-44.2674\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"187.237,-42.6477 179.6343,-35.2687 180.976,-45.7782 187.237,-42.6477\"/>\n",
       "</g>\n",
       "<!-- Z -->\n",
       "<g id=\"node6\" class=\"node\">\n",
       "<title>Z</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"243\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"243\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Z</text>\n",
       "</g>\n",
       "<!-- X&#45;&gt;Z -->\n",
       "<g id=\"edge4\" class=\"edge\">\n",
       "<title>X&#45;&gt;Z</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M215.7146,-72.5708C219.9597,-64.0807 225.1536,-53.6929 229.8663,-44.2674\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"233.024,-45.7782 234.3657,-35.2687 226.763,-42.6477 233.024,-45.7782\"/>\n",
       "</g>\n",
       "<!-- statistics -->\n",
       "<g id=\"node7\" class=\"node\">\n",
       "<title>statistics</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"433\" cy=\"-90\" rx=\"40.6335\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"433\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">statistics</text>\n",
       "</g>\n",
       "<!-- causal inference -->\n",
       "<g id=\"node8\" class=\"node\">\n",
       "<title>causal inference</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"355\" cy=\"-18\" rx=\"67.1093\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"355\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">causal inference</text>\n",
       "</g>\n",
       "<!-- statistics&#45;&gt;causal inference -->\n",
       "<g id=\"edge5\" class=\"edge\">\n",
       "<title>statistics&#45;&gt;causal inference</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M415.3006,-73.6621C405.2941,-64.4253 392.5889,-52.6974 381.4237,-42.3911\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"383.7142,-39.7422 373.9921,-35.5312 378.9662,-44.8858 383.7142,-39.7422\"/>\n",
       "</g>\n",
       "<!-- machine learning -->\n",
       "<g id=\"node9\" class=\"node\">\n",
       "<title>machine learning</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"512\" cy=\"-18\" rx=\"71.4944\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"512\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">machine learning</text>\n",
       "</g>\n",
       "<!-- statistics&#45;&gt;machine learning -->\n",
       "<g id=\"edge6\" class=\"edge\">\n",
       "<title>statistics&#45;&gt;machine learning</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M450.9263,-73.6621C461.0611,-64.4253 473.9292,-52.6974 485.2375,-42.3911\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"487.7311,-44.8541 492.7644,-35.5312 483.0158,-39.6804 487.7311,-44.8541\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x11dedc978>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"C\", \"A\")\n",
    "g.edge(\"C\", \"B\")\n",
    "\n",
    "g.edge(\"X\", \"Y\")\n",
    "g.edge(\"X\", \"Z\")\n",
    "g.node(\"X\", \"X\", color=\"red\")  # red marks the variable we condition on\n",
    "\n",
    "g.edge(\"statistics\", \"causal inference\")\n",
    "g.edge(\"statistics\", \"machine learning\")\n",
    "\n",
    "g"
   ]
  },
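  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a preview of the explanation below, here is a minimal linear simulation of the statistics example. The unit coefficients and Gaussian noise are hypothetical assumptions. Statistics drives both causal inference and machine learning skills. In this linear setting, conditioning on statistics is approximated by including it as a regressor. The two skills should be correlated on their own (around 0.5 under these assumptions). Once statistics is in the regression, the coefficient on causal inference should be close to zero."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(1)\n",
    "n = 100_000\n",
    "\n",
    "# Hypothetical fork: statistics -> causal inference, statistics -> machine learning\n",
    "statistics = rng.normal(size=n)\n",
    "causal_inference = statistics + rng.normal(size=n)\n",
    "machine_learning = statistics + rng.normal(size=n)\n",
    "\n",
    "# Without conditioning, the two skills are dependent through their common cause\n",
    "marginal_corr = np.corrcoef(causal_inference, machine_learning)[0, 1]\n",
    "\n",
    "# Condition on the common cause by including it as a regressor\n",
    "X = np.column_stack([np.ones(n), statistics, causal_inference])\n",
    "coefs, *_ = np.linalg.lstsq(X, machine_learning, rcond=None)\n",
    "\n",
    "print('corr(causal inference, machine learning):', round(marginal_corr, 3))\n",
    "print('coefficient on causal inference given statistics:', round(coefs[2], 3))"
   ]
  },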
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an example, let's say your knowledge of statistics causes you to know more about causal inference and machine learning. If I don't know your level of statistical knowledge, then knowing that you are good at causal inference makes it more likely that you are also good at machine learning. That is because, even if I don't know your level of statistical knowledge, I can infer it from your causal inference knowledge: if you are good at causal inference, you are probably good at statistics, which also makes it more likely that you are good at machine learning. \n",
    "\n",
    "Now, if I condition on your knowledge about statistics, then how much you know about machine learning becomes independent of how much you know about causal inference. Knowing your level of statistics already gives me everything your causal inference skills could tell me about your machine learning skills, so knowing your level of causal inference gives no further information. \n",
    "\n",
    "As a general rule, two variables that share a common cause are dependent, but they become independent when we condition on the common cause. That is,\n",
    "\n",
    "$A \\not\\!\\perp\\!\\!\\!\\perp B$\n",
    "\n",
    "and\n",
    "\n",
    "$\n",
    "A \\!\\perp\\!\\!\\!\\perp B | C\n",
    "$\n",
    "\n",
    "The only structure left is the collider. A collider is a variable where two arrows collide. In this case, the two variables that point into it share a common effect. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
       " -->\n",
       "<!-- Title: %3 Pages: 1 -->\n",
       "<svg width=\"457pt\" height=\"116pt\"\n",
       " viewBox=\"0.00 0.00 456.73 116.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 112)\">\n",
       "<title>%3</title>\n",
       "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-112 452.7344,-112 452.7344,4 -4,4\"/>\n",
       "<!-- B -->\n",
       "<g id=\"node1\" class=\"node\">\n",
       "<title>B</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">B</text>\n",
       "</g>\n",
       "<!-- C -->\n",
       "<g id=\"node2\" class=\"node\">\n",
       "<title>C</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"63\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"63\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">C</text>\n",
       "</g>\n",
       "<!-- B&#45;&gt;C -->\n",
       "<g id=\"edge1\" class=\"edge\">\n",
       "<title>B&#45;&gt;C</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M35.7146,-72.5708C39.9597,-64.0807 45.1536,-53.6929 49.8663,-44.2674\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"53.024,-45.7782 54.3657,-35.2687 46.763,-42.6477 53.024,-45.7782\"/>\n",
       "</g>\n",
       "<!-- A -->\n",
       "<g id=\"node3\" class=\"node\">\n",
       "<title>A</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"99\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">A</text>\n",
       "</g>\n",
       "<!-- A&#45;&gt;C -->\n",
       "<g id=\"edge2\" class=\"edge\">\n",
       "<title>A&#45;&gt;C</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M90.2854,-72.5708C86.0403,-64.0807 80.8464,-53.6929 76.1337,-44.2674\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"79.237,-42.6477 71.6343,-35.2687 72.976,-45.7782 79.237,-42.6477\"/>\n",
       "</g>\n",
       "<!-- Y -->\n",
       "<g id=\"node4\" class=\"node\">\n",
       "<title>Y</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"171\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"171\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Y</text>\n",
       "</g>\n",
       "<!-- X -->\n",
       "<g id=\"node5\" class=\"node\">\n",
       "<title>X</title>\n",
       "<ellipse fill=\"none\" stroke=\"#ff0000\" cx=\"197\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"197\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X</text>\n",
       "</g>\n",
       "<!-- Y&#45;&gt;X -->\n",
       "<g id=\"edge3\" class=\"edge\">\n",
       "<title>Y&#45;&gt;X</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M177.427,-72.2022C180.3866,-64.0064 183.963,-54.1024 187.2447,-45.0145\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"190.5424,-46.1874 190.647,-35.593 183.9585,-43.8098 190.5424,-46.1874\"/>\n",
       "</g>\n",
       "<!-- Z -->\n",
       "<g id=\"node6\" class=\"node\">\n",
       "<title>Z</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"243\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"243\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Z</text>\n",
       "</g>\n",
       "<!-- Z&#45;&gt;X -->\n",
       "<g id=\"edge4\" class=\"edge\">\n",
       "<title>Z&#45;&gt;X</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M232.3311,-73.3008C226.6414,-64.3952 219.5223,-53.2524 213.1684,-43.307\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"215.9445,-41.1513 207.6111,-34.6087 210.0456,-44.92 215.9445,-41.1513\"/>\n",
       "</g>\n",
       "<!-- statistics -->\n",
       "<g id=\"node7\" class=\"node\">\n",
       "<title>statistics</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"329\" cy=\"-90\" rx=\"40.6335\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"329\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">statistics</text>\n",
       "</g>\n",
       "<!-- job promotion -->\n",
       "<g id=\"node8\" class=\"node\">\n",
       "<title>job promotion</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"373\" cy=\"-18\" rx=\"60.429\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"373\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">job promotion</text>\n",
       "</g>\n",
       "<!-- statistics&#45;&gt;job promotion -->\n",
       "<g id=\"edge5\" class=\"edge\">\n",
       "<title>statistics&#45;&gt;job promotion</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M339.6512,-72.5708C344.8165,-64.1184 351.1313,-53.7851 356.8709,-44.3931\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"359.9132,-46.1268 362.1412,-35.7689 353.9402,-42.4766 359.9132,-46.1268\"/>\n",
       "</g>\n",
       "<!-- flatter -->\n",
       "<g id=\"node9\" class=\"node\">\n",
       "<title>flatter</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"418\" cy=\"-90\" rx=\"30.9706\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"418\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">flatter</text>\n",
       "</g>\n",
       "<!-- flatter&#45;&gt;job promotion -->\n",
       "<g id=\"edge6\" class=\"edge\">\n",
       "<title>flatter&#45;&gt;job promotion</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M407.3356,-72.937C402.0571,-64.4914 395.5637,-54.1019 389.6497,-44.6396\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"392.4838,-42.5702 384.2158,-35.9453 386.5478,-46.2803 392.4838,-42.5702\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x11dedca20>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"B\", \"C\")\n",
    "g.edge(\"A\", \"C\")\n",
    "\n",
    "g.edge(\"Y\", \"X\")\n",
    "g.edge(\"Z\", \"X\")\n",
    "g.node(\"X\", \"X\", color=\"red\")\n",
    "\n",
    "g.edge(\"statistics\", \"job promotion\")\n",
    "g.edge(\"flatter\", \"job promotion\")\n",
    "\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an example, consider that there are two ways to get a job promotion. You can either be good at statistics or flatter your boss. If I don't condition on your job promotion, that is, I know nothing if you will or won't get it, then your level of statistics and flattering are independent. In other words, knowing how good you are at statistics tells me nothing about how good you are at flattering your boss. On the other hand, if you did get a job promotion, suddenly, knowing your level of statistics tells me about your level of flattering. If you are bad at statistics and you did get a promotion, it becomes more likely that you know how to flatter, otherwise you wouldn't get a promotion. Conversely, if you are bad at flattering, it must be the case that you are good at statistics. This phenomenon is sometimes called **explaining away**, because one cause already explains the effect, making the other cause less likely.\n",
    "\n",
    "As a general rule, conditioning on a collider opens the dependence path. Not conditioning on it leaves it closed. Or\n",
    "\n",
    "$A \\!\\perp\\!\\!\\!\\perp B$\n",
    "\n",
    "and\n",
    "\n",
    "$\n",
    "A \\not\\!\\perp\\!\\!\\!\\perp B | C\n",
    "$\n",
    "\n",
    "Knowing the three structures, we can derive an even more general rule. A path is blocked if and only if:\n",
    "1. It contains a non collider that has been conditioned on\n",
    "2. It contains a collider that has not been conditioned on and has no descendants that have been conditioned on.\n",
    "\n",
    "Here is a cheat sheet about how dependence flows in a graph. I've taken from a [Stanford presentation](http://ai.stanford.edu/~paskin/gm-short-course/lec2.pdf) by Mark Paskin.\n",
    "![img](data/img/graph-flow.png)\n",
    "\n",
    "As a final example, try to figure out some independence and dependence relationship in the following causal graph.\n",
    "1. Is \\\\(D \\!\\perp\\!\\!\\!\\perp C\\\\)?\n",
    "2. Is \\\\(D \\!\\perp\\!\\!\\!\\perp C| A \\\\) ?\n",
    "3. Is \\\\(D \\!\\perp\\!\\!\\!\\perp C| G \\\\) ?\n",
    "4. Is \\\\(A \\!\\perp\\!\\!\\!\\perp F \\\\) ?\n",
    "5. Is \\\\(A \\!\\perp\\!\\!\\!\\perp F|E \\\\) ?\n",
    "6. Is \\\\(A \\!\\perp\\!\\!\\!\\perp F|E,C \\\\) ?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
       " -->\n",
       "<!-- Title: %3 Pages: 1 -->\n",
       "<svg width=\"206pt\" height=\"188pt\"\n",
       " viewBox=\"0.00 0.00 206.00 188.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
       "<title>%3</title>\n",
       "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-184 202,-184 202,4 -4,4\"/>\n",
       "<!-- C -->\n",
       "<g id=\"node1\" class=\"node\">\n",
       "<title>C</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"99\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">C</text>\n",
       "</g>\n",
       "<!-- A -->\n",
       "<g id=\"node2\" class=\"node\">\n",
       "<title>A</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">A</text>\n",
       "</g>\n",
       "<!-- C&#45;&gt;A -->\n",
       "<g id=\"edge1\" class=\"edge\">\n",
       "<title>C&#45;&gt;A</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M83.7307,-146.7307C73.803,-136.803 60.6847,-123.6847 49.5637,-112.5637\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"51.7933,-109.8436 42.2473,-105.2473 46.8436,-114.7933 51.7933,-109.8436\"/>\n",
       "</g>\n",
       "<!-- B -->\n",
       "<g id=\"node3\" class=\"node\">\n",
       "<title>B</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"99\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">B</text>\n",
       "</g>\n",
       "<!-- C&#45;&gt;B -->\n",
       "<g id=\"edge2\" class=\"edge\">\n",
       "<title>C&#45;&gt;B</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M99,-143.8314C99,-136.131 99,-126.9743 99,-118.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"102.5001,-118.4132 99,-108.4133 95.5001,-118.4133 102.5001,-118.4132\"/>\n",
       "</g>\n",
       "<!-- G -->\n",
       "<g id=\"node7\" class=\"node\">\n",
       "<title>G</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">G</text>\n",
       "</g>\n",
       "<!-- A&#45;&gt;G -->\n",
       "<g id=\"edge6\" class=\"edge\">\n",
       "<title>A&#45;&gt;G</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M27,-71.8314C27,-64.131 27,-54.9743 27,-46.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"30.5001,-46.4132 27,-36.4133 23.5001,-46.4133 30.5001,-46.4132\"/>\n",
       "</g>\n",
       "<!-- E -->\n",
       "<g id=\"node5\" class=\"node\">\n",
       "<title>E</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"135\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"135\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">E</text>\n",
       "</g>\n",
       "<!-- B&#45;&gt;E -->\n",
       "<g id=\"edge4\" class=\"edge\">\n",
       "<title>B&#45;&gt;E</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M107.7146,-72.5708C111.9597,-64.0807 117.1536,-53.6929 121.8663,-44.2674\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"125.024,-45.7782 126.3657,-35.2687 118.763,-42.6477 125.024,-45.7782\"/>\n",
       "</g>\n",
       "<!-- D -->\n",
       "<g id=\"node4\" class=\"node\">\n",
       "<title>D</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">D</text>\n",
       "</g>\n",
       "<!-- D&#45;&gt;A -->\n",
       "<g id=\"edge3\" class=\"edge\">\n",
       "<title>D&#45;&gt;A</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M27,-143.8314C27,-136.131 27,-126.9743 27,-118.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"30.5001,-118.4132 27,-108.4133 23.5001,-118.4133 30.5001,-118.4132\"/>\n",
       "</g>\n",
       "<!-- F -->\n",
       "<g id=\"node6\" class=\"node\">\n",
       "<title>F</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"171\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"171\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">F</text>\n",
       "</g>\n",
       "<!-- F&#45;&gt;E -->\n",
       "<g id=\"edge5\" class=\"edge\">\n",
       "<title>F&#45;&gt;E</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M162.2854,-72.5708C158.0403,-64.0807 152.8464,-53.6929 148.1337,-44.2674\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"151.237,-42.6477 143.6343,-35.2687 144.976,-45.7782 151.237,-42.6477\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x11dedcef0>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"C\", \"A\")\n",
    "g.edge(\"C\", \"B\")\n",
    "g.edge(\"D\", \"A\")\n",
    "g.edge(\"B\", \"E\")\n",
    "g.edge(\"F\", \"E\")\n",
    "g.edge(\"A\", \"G\")\n",
    "\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Answers**:\n",
    "1. \\\\(D \\!\\perp\\!\\!\\!\\perp C\\\\). It contains a collider that it has **not** been conditioned on.\n",
    "2. \\\\(D \\not\\!\\perp\\!\\!\\!\\perp C| A \\\\). It contains a collider that it has  been conditioned on.\n",
    "3. \\\\(D \\not\\!\\perp\\!\\!\\!\\perp C| G \\\\). It contains the descendent of a collider that has  been conditioned on. You can see G as some kind of proxy for A here.\n",
    "4. \\\\(A \\!\\perp\\!\\!\\!\\perp F \\\\). It contains a collider, B->E<-F, that it has **not** been conditioned on.\n",
    "5. \\\\(A \\not\\!\\perp\\!\\!\\!\\perp F|E \\\\). It contains a collider, B->E<-F, that it has been conditioned on.\n",
    "6. \\\\(A \\!\\perp\\!\\!\\!\\perp F|E, C \\\\). It contains a collider, B->E<-F, that it has been conditioned on, but it contains a non collider that has been conditioned on. Conditioning on E opens the path, but conditioning on C closes it again.\n",
    "\n",
    "Knowing about causal graphical models enables us to understand the problems that arise in causal inference. As we've seen, the problem always boils down to bias. \n",
    "\n",
    "$\n",
    "E[Y|T=1] - E[Y|T=0] = \\underbrace{E[Y_1 - Y_0|T=1]}_{ATET} + \\underbrace{\\{ E[Y_0|T=1] - E[Y_0|T=0] \\}}_{BIAS}\n",
    "$\n",
    "\n",
    "Graphical models allow us to diagnose which bias we are dealing with and what are the tools we need to correct for them.\n",
    "\n",
    "## Confounding Bias\n",
    "\n",
    "![img](./data/img/causal-graph/both_crap.png)\n",
    "\n",
    "The first big cause of bias is confounding. It happens when the treatment and the outcome shares a common cause. For example, let's say that the treatment is education and the outcome is income. It is hard to know the causal effect of education on the wage because both share a common cause: intelligence. So we could make the argument that more educated people earn more money simply because they are more intelligent, not because they have more education. In order to identify the causal effect, we need to close all backdoor paths between the treatment and the outcome. If we do so, the only effect that will be left is the direct effect T->Y. In our example, if we control for intelligence, that is, we compare people with the same level of intelligence but different levels of education, the difference in the outcome will be only due to the difference in education, since intelligence will be the same for everyone. In order to fix confounding bias, we need to control all common causes of the treatment and the outcome."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
       " -->\n",
       "<!-- Title: %3 Pages: 1 -->\n",
       "<svg width=\"229pt\" height=\"188pt\"\n",
       " viewBox=\"0.00 0.00 229.39 188.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
       "<title>%3</title>\n",
       "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-184 225.3886,-184 225.3886,4 -4,4\"/>\n",
       "<!-- X -->\n",
       "<g id=\"node1\" class=\"node\">\n",
       "<title>X</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"70\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"70\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X</text>\n",
       "</g>\n",
       "<!-- T -->\n",
       "<g id=\"node2\" class=\"node\">\n",
       "<title>T</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">T</text>\n",
       "</g>\n",
       "<!-- X&#45;&gt;T -->\n",
       "<g id=\"edge1\" class=\"edge\">\n",
       "<title>X&#45;&gt;T</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M59.8096,-144.937C54.6151,-136.2393 48.1894,-125.4799 42.4052,-115.7948\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"45.2496,-113.7314 37.1173,-106.9405 39.2398,-117.3206 45.2496,-113.7314\"/>\n",
       "</g>\n",
       "<!-- Y -->\n",
       "<g id=\"node3\" class=\"node\">\n",
       "<title>Y</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"45\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"45\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Y</text>\n",
       "</g>\n",
       "<!-- X&#45;&gt;Y -->\n",
       "<g id=\"edge2\" class=\"edge\">\n",
       "<title>X&#45;&gt;Y</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M69.8836,-143.892C69.4649,-125.5892 67.974,-96.5772 63,-72 61.2162,-63.186 58.4261,-53.8148 55.5706,-45.3964\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"58.7702,-43.948 52.1182,-35.7017 52.1758,-46.2964 58.7702,-43.948\"/>\n",
       "</g>\n",
       "<!-- T&#45;&gt;Y -->\n",
       "<g id=\"edge3\" class=\"edge\">\n",
       "<title>T&#45;&gt;Y</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M31.4494,-72.2022C33.4398,-64.2406 35.8332,-54.6671 38.0511,-45.7957\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"41.5094,-46.3929 40.5394,-35.8425 34.7184,-44.6951 41.5094,-46.3929\"/>\n",
       "</g>\n",
       "<!-- Inteligence -->\n",
       "<g id=\"node4\" class=\"node\">\n",
       "<title>Inteligence</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"172\" cy=\"-162\" rx=\"49.2774\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"172\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Inteligence</text>\n",
       "</g>\n",
       "<!-- Educ -->\n",
       "<g id=\"node5\" class=\"node\">\n",
       "<title>Educ</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"149\" cy=\"-90\" rx=\"28.0565\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"149\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Educ</text>\n",
       "</g>\n",
       "<!-- Inteligence&#45;&gt;Educ -->\n",
       "<g id=\"edge4\" class=\"edge\">\n",
       "<title>Inteligence&#45;&gt;Educ</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M166.1961,-143.8314C163.6435,-135.8406 160.59,-126.2819 157.7692,-117.4514\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"161.0994,-116.3741 154.7223,-107.9134 154.4313,-118.5043 161.0994,-116.3741\"/>\n",
       "</g>\n",
       "<!-- Wage -->\n",
       "<g id=\"node6\" class=\"node\">\n",
       "<title>Wage</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"177\" cy=\"-18\" rx=\"30.0438\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"177\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Wage</text>\n",
       "</g>\n",
       "<!-- Inteligence&#45;&gt;Wage -->\n",
       "<g id=\"edge5\" class=\"edge\">\n",
       "<title>Inteligence&#45;&gt;Wage</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M177.9698,-143.7319C181.0408,-133.3756 184.4563,-120.105 186,-108 188.6444,-87.2634 186.1047,-63.7558 183.0575,-45.9668\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"186.4654,-45.1494 181.1885,-35.9622 179.5845,-46.4349 186.4654,-45.1494\"/>\n",
       "</g>\n",
       "<!-- Educ&#45;&gt;Wage -->\n",
       "<g id=\"edge6\" class=\"edge\">\n",
       "<title>Educ&#45;&gt;Wage</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M155.9214,-72.2022C159.1086,-64.0064 162.9602,-54.1024 166.4943,-45.0145\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"169.7958,-46.1817 170.1583,-35.593 163.2717,-43.6445 169.7958,-46.1817\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x11deda2b0>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"X\", \"T\")\n",
    "g.edge(\"X\", \"Y\")\n",
    "g.edge(\"T\", \"Y\")\n",
    "\n",
    "g.edge(\"Inteligence\", \"Educ\"),\n",
    "g.edge(\"Inteligence\", \"Wage\"),\n",
    "g.edge(\"Educ\", \"Wage\")\n",
    "g"
   ]
  },
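  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what controlling for a confounder buys us, here is a simulated sketch of the education example. Everything in it is an assumption made up for illustration: intelligence is observed (which, as discussed next, it usually isn't), the true effect of one extra year of education on wage is set to 1, and the small `ols` helper is a hypothetical least-squares wrapper around `np.linalg.lstsq`.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(1)\n",
    "n = 50_000\n",
    "\n",
    "# Hypothetical data-generating process: intelligence -> educ, intelligence -> wage\n",
    "intelligence = rng.normal(size=n)\n",
    "educ = 12 + 2 * intelligence + rng.normal(size=n)\n",
    "wage = 10 + 1.0 * educ + 3 * intelligence + rng.normal(size=n)  # true effect of educ: 1.0\n",
    "\n",
    "\n",
    "def ols(y, *regressors):\n",
    "    # Least-squares fit with an intercept; returns the slope coefficients only\n",
    "    X = np.column_stack([np.ones_like(y), *regressors])\n",
    "    return np.linalg.lstsq(X, y, rcond=None)[0][1:]\n",
    "\n",
    "\n",
    "naive = ols(wage, educ)[0]                   # backdoor path educ <- intelligence -> wage is open\n",
    "adjusted = ols(wage, educ, intelligence)[0]  # conditioning on intelligence closes it\n",
    "print(f'naive: {naive:.2f}, adjusted: {adjusted:.2f}')\n",
    "```\n",
    "\n",
    "Under these assumptions, the naive slope should land near 2.2 (the true 1.0 plus a bias of 3 * 2 / 5 = 1.2 coming from intelligence), while the adjusted slope should land near 1.0."
   ]
  },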
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Unfortunately, it is not always possible to control for all common causes. Sometimes, there are unknown causes or known causes that we can't measure. The case of intelligence is one of the latter. Despite all the effort, scientists haven't yet figured out how to measure intelligence well. I'll use U to denote unmeasured variables here. Now, assume for a moment that intelligence can't affect your education directly. It just affects how well you do on the SATs, but it is the SATs that determine your level of education, since it opens the possibility of a good college. Even if we can't control for the unmeasurable intelligence, we  can control for SAT and close that backdoor path."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
       " -->\n",
       "<!-- Title: %3 Pages: 1 -->\n",
       "<svg width=\"439pt\" height=\"260pt\"\n",
       " viewBox=\"0.00 0.00 439.39 260.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 256)\">\n",
       "<title>%3</title>\n",
       "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-256 435.3886,-256 435.3886,4 -4,4\"/>\n",
       "<!-- X1 -->\n",
       "<g id=\"node1\" class=\"node\">\n",
       "<title>X1</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X1</text>\n",
       "</g>\n",
       "<!-- T -->\n",
       "<g id=\"node2\" class=\"node\">\n",
       "<title>T</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"95\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"95\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">T</text>\n",
       "</g>\n",
       "<!-- X1&#45;&gt;T -->\n",
       "<g id=\"edge1\" class=\"edge\">\n",
       "<title>X1&#45;&gt;T</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M41.4211,-146.7307C50.7972,-136.803 63.1867,-123.6847 73.6899,-112.5637\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"76.278,-114.9207 80.5997,-105.2473 71.1889,-110.1143 76.278,-114.9207\"/>\n",
       "</g>\n",
       "<!-- Y -->\n",
       "<g id=\"node3\" class=\"node\">\n",
       "<title>Y</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"95\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"95\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Y</text>\n",
       "</g>\n",
       "<!-- X1&#45;&gt;Y -->\n",
       "<g id=\"edge4\" class=\"edge\">\n",
       "<title>X1&#45;&gt;Y</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M31.9041,-144.2215C37.342,-125.6765 46.9578,-96.0147 59,-72 64.0001,-62.0288 70.5168,-51.6939 76.6293,-42.7678\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"79.6417,-44.5689 82.536,-34.377 73.9177,-40.5395 79.6417,-44.5689\"/>\n",
       "</g>\n",
       "<!-- T&#45;&gt;Y -->\n",
       "<g id=\"edge2\" class=\"edge\">\n",
       "<title>T&#45;&gt;Y</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M95,-71.8314C95,-64.131 95,-54.9743 95,-46.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"98.5001,-46.4132 95,-36.4133 91.5001,-46.4133 98.5001,-46.4132\"/>\n",
       "</g>\n",
       "<!-- X2 -->\n",
       "<g id=\"node4\" class=\"node\">\n",
       "<title>X2</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"99\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X2</text>\n",
       "</g>\n",
       "<!-- X2&#45;&gt;T -->\n",
       "<g id=\"edge3\" class=\"edge\">\n",
       "<title>X2&#45;&gt;T</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M97.9906,-143.8314C97.5628,-136.131 97.0541,-126.9743 96.5787,-118.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"100.0724,-118.2037 96.023,-108.4133 93.0831,-118.592 100.0724,-118.2037\"/>\n",
       "</g>\n",
       "<!-- U -->\n",
       "<g id=\"node5\" class=\"node\">\n",
       "<title>U</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"127\" cy=\"-234\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"127\" y=\"-229.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">U</text>\n",
       "</g>\n",
       "<!-- U&#45;&gt;Y -->\n",
       "<g id=\"edge6\" class=\"edge\">\n",
       "<title>U&#45;&gt;Y</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M131.3409,-215.9151C137.8339,-185.3453 147.6367,-121.91 131,-72 127.4726,-61.4177 121.2896,-50.9444 115.0324,-42.0784\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"117.6371,-39.7171 108.8461,-33.8036 112.0306,-43.9085 117.6371,-39.7171\"/>\n",
       "</g>\n",
       "<!-- U&#45;&gt;X2 -->\n",
       "<g id=\"edge5\" class=\"edge\">\n",
       "<title>U&#45;&gt;X2</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M120.222,-216.5708C117.0128,-208.3187 113.1065,-198.2738 109.5242,-189.0623\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"112.6992,-187.5697 105.8126,-179.5182 106.1752,-190.1069 112.6992,-187.5697\"/>\n",
       "</g>\n",
       "<!-- Family Income -->\n",
       "<g id=\"node6\" class=\"node\">\n",
       "<title>Family Income</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"246\" cy=\"-162\" rx=\"63.7863\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"246\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Family Income</text>\n",
       "</g>\n",
       "<!-- Educ -->\n",
       "<g id=\"node7\" class=\"node\">\n",
       "<title>Educ</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"341\" cy=\"-90\" rx=\"28.0565\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"341\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Educ</text>\n",
       "</g>\n",
       "<!-- Family Income&#45;&gt;Educ -->\n",
       "<g id=\"edge7\" class=\"edge\">\n",
       "<title>Family Income&#45;&gt;Educ</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M268.5137,-144.937C282.3655,-134.4388 300.1803,-120.937 314.6824,-109.946\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"316.9275,-112.6361 322.7832,-103.8064 312.6994,-107.0573 316.9275,-112.6361\"/>\n",
       "</g>\n",
       "<!-- Wage -->\n",
       "<g id=\"node8\" class=\"node\">\n",
       "<title>Wage</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"341\" cy=\"-18\" rx=\"30.0438\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"341\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Wage</text>\n",
       "</g>\n",
       "<!-- Family Income&#45;&gt;Wage -->\n",
       "<g id=\"edge10\" class=\"edge\">\n",
       "<title>Family Income&#45;&gt;Wage</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M257.4064,-143.9732C268.9968,-125.7337 287.5373,-96.7699 304,-72 310.2276,-62.6298 317.172,-52.4311 323.3474,-43.446\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"326.4164,-45.1607 329.2114,-34.9411 320.6535,-41.1872 326.4164,-45.1607\"/>\n",
       "</g>\n",
       "<!-- Educ&#45;&gt;Wage -->\n",
       "<g id=\"edge8\" class=\"edge\">\n",
       "<title>Educ&#45;&gt;Wage</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M341,-71.8314C341,-64.131 341,-54.9743 341,-46.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"344.5001,-46.4132 341,-36.4133 337.5001,-46.4133 344.5001,-46.4132\"/>\n",
       "</g>\n",
       "<!-- SAT -->\n",
       "<g id=\"node9\" class=\"node\">\n",
       "<title>SAT</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"355\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"355\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">SAT</text>\n",
       "</g>\n",
       "<!-- SAT&#45;&gt;Educ -->\n",
       "<g id=\"edge9\" class=\"edge\">\n",
       "<title>SAT&#45;&gt;Educ</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M351.4672,-143.8314C349.9302,-135.9266 348.0947,-126.4872 346.3932,-117.7365\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"349.8276,-117.0615 344.4832,-107.9134 342.9563,-118.3976 349.8276,-117.0615\"/>\n",
       "</g>\n",
       "<!-- Inteligence -->\n",
       "<g id=\"node10\" class=\"node\">\n",
       "<title>Inteligence</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"382\" cy=\"-234\" rx=\"49.2774\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"382\" y=\"-229.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Inteligence</text>\n",
       "</g>\n",
       "<!-- Inteligence&#45;&gt;Wage -->\n",
       "<g id=\"edge12\" class=\"edge\">\n",
       "<title>Inteligence&#45;&gt;Wage</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M385.8413,-215.6001C387.8166,-205.2025 390.0123,-191.9346 391,-180 394.9876,-131.8182 396.3661,-116.7221 378,-72 373.7907,-61.7502 367.4207,-51.475 361.1464,-42.6804\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"363.7762,-40.3524 354.9877,-34.4353 358.168,-44.5415 363.7762,-40.3524\"/>\n",
       "</g>\n",
       "<!-- Inteligence&#45;&gt;SAT -->\n",
       "<g id=\"edge11\" class=\"edge\">\n",
       "<title>Inteligence&#45;&gt;SAT</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M375.1868,-215.8314C372.1902,-207.8406 368.6057,-198.2819 365.2943,-189.4514\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"368.506,-188.0478 361.7175,-179.9134 361.9517,-190.5057 368.506,-188.0478\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x11dedad68>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"X1\", \"T\")\n",
    "g.edge(\"T\", \"Y\")\n",
    "g.edge(\"X2\", \"T\")\n",
    "g.edge(\"X1\", \"Y\")\n",
    "g.edge(\"U\", \"X2\")\n",
    "g.edge(\"U\", \"Y\")\n",
    "\n",
    "g.edge(\"Family Income\", \"Educ\")\n",
    "g.edge(\"Educ\", \"Wage\")\n",
    "g.edge(\"SAT\", \"Educ\")\n",
    "g.edge(\"Family Income\", \"Wage\")\n",
    "g.edge(\"Intelligence\", \"SAT\")\n",
    "g.edge(\"Intelligence\", \"Wage\")\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the graph above, conditioning on X1 and X2, or, in the example, on family income and SAT, is sufficient to close all backdoor paths between the treatment and the outcome. In other words, \\\\((Y_0, Y_1) \\perp T | X1, X2\\\\). So even if we can't measure all common causes, we can still attain conditional independence if we control for measurable variables that mediate the effect of the unmeasured variable on the treatment.\n",
    "\n",
    "But what if that is not the case? What if the unmeasured variable causes the treatment and the outcome directly? In the following example, intelligence causes both education and wage directly, so there is confounding in the relationship between the treatment, education, and the outcome, wage. In this case, we can't control for the confounder, because it is unmeasured. However, we have other measured variables that can act as proxies for the confounder. Those variables are not on the backdoor path, but controlling for them will help lower the bias (though it won't eliminate it). Such variables are sometimes referred to as surrogate confounders.\n",
    "\n",
    "In our example, we can't measure intelligence, but we can measure some of its causes, like father's education and mother's education, and some of its effects, like IQ or SAT score. Controlling for those surrogate variables is not sufficient to eliminate bias, but it sure helps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
       " -->\n",
       "<!-- Title: %3 Pages: 1 -->\n",
       "<svg width=\"392pt\" height=\"260pt\"\n",
       " viewBox=\"0.00 0.00 391.52 260.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 256)\">\n",
       "<title>%3</title>\n",
       "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-256 387.5221,-256 387.5221,4 -4,4\"/>\n",
       "<!-- X -->\n",
       "<g id=\"node1\" class=\"node\">\n",
       "<title>X</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"55\" cy=\"-234\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"55\" y=\"-229.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X</text>\n",
       "</g>\n",
       "<!-- U -->\n",
       "<g id=\"node2\" class=\"node\">\n",
       "<title>U</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"55\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"55\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">U</text>\n",
       "</g>\n",
       "<!-- X&#45;&gt;U -->\n",
       "<g id=\"edge1\" class=\"edge\">\n",
       "<title>X&#45;&gt;U</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M55,-215.8314C55,-208.131 55,-198.9743 55,-190.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"58.5001,-190.4132 55,-180.4133 51.5001,-190.4133 58.5001,-190.4132\"/>\n",
       "</g>\n",
       "<!-- T -->\n",
       "<g id=\"node3\" class=\"node\">\n",
       "<title>T</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">T</text>\n",
       "</g>\n",
       "<!-- U&#45;&gt;T -->\n",
       "<g id=\"edge2\" class=\"edge\">\n",
       "<title>U&#45;&gt;T</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M48.222,-144.5708C45.0128,-136.3187 41.1065,-126.2738 37.5242,-117.0623\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"40.6992,-115.5697 33.8126,-107.5182 34.1752,-118.1069 40.6992,-115.5697\"/>\n",
       "</g>\n",
       "<!-- Y -->\n",
       "<g id=\"node4\" class=\"node\">\n",
       "<title>Y</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"55\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"55\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Y</text>\n",
       "</g>\n",
       "<!-- U&#45;&gt;Y -->\n",
       "<g id=\"edge4\" class=\"edge\">\n",
       "<title>U&#45;&gt;Y</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M58.3315,-144.0736C60.1087,-133.5982 62.1087,-120.0982 63,-108 64.1756,-92.0432 64.1756,-87.9568 63,-72 62.3733,-63.4935 61.1985,-54.2939 59.9384,-45.9399\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"63.3718,-45.2455 58.3315,-35.9264 56.4602,-46.3546 63.3718,-45.2455\"/>\n",
       "</g>\n",
       "<!-- T&#45;&gt;Y -->\n",
       "<g id=\"edge3\" class=\"edge\">\n",
       "<title>T&#45;&gt;Y</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M33.778,-72.5708C36.9872,-64.3187 40.8935,-54.2738 44.4758,-45.0623\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"47.8248,-46.1069 48.1874,-35.5182 41.3008,-43.5697 47.8248,-46.1069\"/>\n",
       "</g>\n",
       "<!-- Inteligence -->\n",
       "<g id=\"node5\" class=\"node\">\n",
       "<title>Inteligence</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"245\" cy=\"-162\" rx=\"49.2774\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"245\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Inteligence</text>\n",
       "</g>\n",
       "<!-- QI -->\n",
       "<g id=\"node6\" class=\"node\">\n",
       "<title>QI</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"137\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"137\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">QI</text>\n",
       "</g>\n",
       "<!-- Inteligence&#45;&gt;QI -->\n",
       "<g id=\"edge5\" class=\"edge\">\n",
       "<title>Inteligence&#45;&gt;QI</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M221.2989,-146.1993C204.6959,-135.1306 182.4427,-120.2952 164.9694,-108.6462\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"166.5848,-105.5168 156.3228,-102.8819 162.7019,-111.3411 166.5848,-105.5168\"/>\n",
       "</g>\n",
       "<!-- SAT -->\n",
       "<g id=\"node7\" class=\"node\">\n",
       "<title>SAT</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"209\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"209\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">SAT</text>\n",
       "</g>\n",
       "<!-- Inteligence&#45;&gt;SAT -->\n",
       "<g id=\"edge6\" class=\"edge\">\n",
       "<title>Inteligence&#45;&gt;SAT</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M236.1011,-144.2022C231.8978,-135.7955 226.7959,-125.5917 222.1575,-116.3149\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"225.1513,-114.4762 217.5486,-107.0972 218.8903,-117.6068 225.1513,-114.4762\"/>\n",
       "</g>\n",
       "<!-- Educ -->\n",
       "<g id=\"node10\" class=\"node\">\n",
       "<title>Educ</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"282\" cy=\"-90\" rx=\"28.0565\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"282\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Educ</text>\n",
       "</g>\n",
       "<!-- Inteligence&#45;&gt;Educ -->\n",
       "<g id=\"edge9\" class=\"edge\">\n",
       "<title>Inteligence&#45;&gt;Educ</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M254.1461,-144.2022C258.4344,-135.8574 263.6327,-125.7417 268.3718,-116.5197\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"271.6291,-117.8387 273.0868,-107.3446 265.4031,-114.6391 271.6291,-117.8387\"/>\n",
       "</g>\n",
       "<!-- Wage -->\n",
       "<g id=\"node11\" class=\"node\">\n",
       "<title>Wage</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"310\" cy=\"-18\" rx=\"30.0438\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"310\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Wage</text>\n",
       "</g>\n",
       "<!-- Inteligence&#45;&gt;Wage -->\n",
       "<g id=\"edge11\" class=\"edge\">\n",
       "<title>Inteligence&#45;&gt;Wage</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M276.225,-147.9734C292.139,-139.0127 309.993,-125.7228 319,-108 328.8446,-88.6291 325.4171,-63.9039 320.1891,-45.2723\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"323.4964,-44.1214 317.1723,-35.6211 316.8152,-46.2099 323.4964,-44.1214\"/>\n",
       "</g>\n",
       "<!-- Father Educ -->\n",
       "<g id=\"node8\" class=\"node\">\n",
       "<title>Father Educ</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"199\" cy=\"-234\" rx=\"53.1612\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"199\" y=\"-229.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Father Educ</text>\n",
       "</g>\n",
       "<!-- Father Educ&#45;&gt;Inteligence -->\n",
       "<g id=\"edge7\" class=\"edge\">\n",
       "<title>Father Educ&#45;&gt;Inteligence</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M210.3708,-216.2022C215.7742,-207.7448 222.3396,-197.4685 228.295,-188.147\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"231.3255,-189.9044 233.76,-179.593 225.4266,-186.1356 231.3255,-189.9044\"/>\n",
       "</g>\n",
       "<!-- Mother Educ -->\n",
       "<g id=\"node9\" class=\"node\">\n",
       "<title>Mother Educ</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"327\" cy=\"-234\" rx=\"56.5441\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"327\" y=\"-229.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Mother Educ</text>\n",
       "</g>\n",
       "<!-- Mother Educ&#45;&gt;Inteligence -->\n",
       "<g id=\"edge8\" class=\"edge\">\n",
       "<title>Mother Educ&#45;&gt;Inteligence</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M307.5671,-216.937C296.8696,-207.5441 283.4337,-195.7467 271.7576,-185.4944\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"273.8342,-182.6601 264.0105,-178.6921 269.2156,-187.9202 273.8342,-182.6601\"/>\n",
       "</g>\n",
       "<!-- Educ&#45;&gt;Wage -->\n",
       "<g id=\"edge10\" class=\"edge\">\n",
       "<title>Educ&#45;&gt;Wage</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M288.9214,-72.2022C292.1086,-64.0064 295.9602,-54.1024 299.4943,-45.0145\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"302.7958,-46.1817 303.1583,-35.593 296.2717,-43.6445 302.7958,-46.1817\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x11dedab70>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"X\", \"U\")\n",
    "g.edge(\"U\", \"T\")\n",
    "g.edge(\"T\", \"Y\")\n",
    "g.edge(\"U\", \"Y\")\n",
    "\n",
    "g.edge(\"Intelligence\", \"IQ\")\n",
    "g.edge(\"Intelligence\", \"SAT\")\n",
    "g.edge(\"Father Educ\", \"Intelligence\")\n",
    "g.edge(\"Mother Educ\", \"Intelligence\")\n",
    "\n",
    "g.edge(\"Intelligence\", \"Educ\")\n",
    "g.edge(\"Educ\", \"Wage\")\n",
    "g.edge(\"Intelligence\", \"Wage\")\n",
    "\n",
    "g"
   ]
  },
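  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough check of the surrogate-confounder argument, here is a minimal simulation sketch. The data-generating process is entirely hypothetical: we assume a linear model in which intelligence (unobserved in practice) is caused by father's and mother's education, causes IQ and SAT scores, and directly causes both education and wage, with a true effect of education on wage of 2. The helper `ols_coef` is also hypothetical, written only for this illustration. Under these assumptions, regressing wage on education alone overstates the effect, adding the surrogates shrinks the bias without removing it, and only controlling for intelligence itself (possible here only because the data are simulated) recovers roughly 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical linear model; every coefficient below is an assumption for illustration\n",
    "rng = np.random.default_rng(42)\n",
    "n = 100_000\n",
    "\n",
    "father_educ = rng.normal(size=n)\n",
    "mother_educ = rng.normal(size=n)\n",
    "intelligence = father_educ + mother_educ + rng.normal(size=n)  # unmeasured confounder\n",
    "iq = intelligence + rng.normal(size=n)   # noisy effects of intelligence\n",
    "sat = intelligence + rng.normal(size=n)\n",
    "\n",
    "educ = intelligence + rng.normal(size=n)                  # treatment\n",
    "wage = 2 * educ + 3 * intelligence + rng.normal(size=n)  # outcome; true effect = 2\n",
    "\n",
    "def ols_coef(y, *covariates):\n",
    "    # Hypothetical helper: OLS coefficient on the first covariate, with an intercept\n",
    "    X = np.column_stack([np.ones_like(y), *covariates])\n",
    "    return np.linalg.lstsq(X, y, rcond=None)[0][1]\n",
    "\n",
    "naive = ols_coef(wage, educ)  # roughly 4.25 in this model: confounded by intelligence\n",
    "with_surrogates = ols_coef(wage, educ, father_educ, mother_educ, iq, sat)  # roughly 2.75: less biased\n",
    "oracle = ols_coef(wage, educ, intelligence)  # roughly 2: needs the unmeasured confounder\n",
    "print(f'naive: {naive:.2f}, with surrogates: {with_surrogates:.2f}, oracle: {oracle:.2f}')"
   ]
  },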
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Selection Bias\n",
    "\n",
    "You might think that it is a good idea to add everything you can measure to your model just to be sure you don't have confounding bias. Well, think again.\n",
    "\n",
    "![image.png](./data/img/causal-graph/selection_bias.png)\n",
    "\n",
    "The second big source of bias is what we will call selection bias. If confounding bias happens when we don't control for a common cause, selection bias is related to effects: it typically arises when we condition on a common effect of the treatment and the outcome, or on a variable that lies on the causal path from the treatment to the outcome. One word of caution here: economists tend to refer to all sorts of bias as selection bias. Here, I think the distinction between it and confounding bias is very helpful, so I'll stick to it.\n",
    "\n",
    "More often than not, selection bias arises when we control for more variables than we should. It might be the case that the treatment and the potential outcome are marginally independent, but become dependent once we condition on a collider.\n",
    "\n",
    "Imagine that, with the help of some miracle, you are finally able to randomize education in order to measure its effect on wage. But, just to be sure you won't have confounding, you control for a lot of variables. Among them, you control for investments. But investment is not a common cause of education and wage. Instead, it is a consequence of both: more educated people both earn more and invest more, and those who earn more also invest more. Since investment is a collider, by conditioning on it you open a second path between the treatment and the outcome, which biases the estimate of the causal effect. One way to think about this is that by controlling for investments, you are looking at small groups of the population where investment is the same and then finding the effect of education within those groups. But by doing so, you are also, indirectly and inadvertently, not allowing wages to change much. As a result, you won't be able to see how education changes wage, because you are not allowing wages to change as they should."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
       " -->\n",
       "<!-- Title: %3 Pages: 1 -->\n",
       "<svg width=\"216pt\" height=\"188pt\"\n",
       " viewBox=\"0.00 0.00 216.02 188.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
       "<title>%3</title>\n",
       "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-184 212.0219,-184 212.0219,4 -4,4\"/>\n",
       "<!-- T -->\n",
       "<g id=\"node1\" class=\"node\">\n",
       "<title>T</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"38\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"38\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">T</text>\n",
       "</g>\n",
       "<!-- X -->\n",
       "<g id=\"node2\" class=\"node\">\n",
       "<title>X</title>\n",
       "<ellipse fill=\"none\" stroke=\"#ff0000\" cx=\"27\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X</text>\n",
       "</g>\n",
       "<!-- T&#45;&gt;X -->\n",
       "<g id=\"edge1\" class=\"edge\">\n",
       "<title>T&#45;&gt;X</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M27.0915,-145.2705C21.0443,-134.9795 14.1306,-121.2615 11,-108 6.0392,-86.9857 10.8706,-62.8873 16.4871,-44.9185\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"19.8277,-45.9656 19.7233,-35.3712 13.1982,-43.7183 19.8277,-45.9656\"/>\n",
       "</g>\n",
       "<!-- Y -->\n",
       "<g id=\"node3\" class=\"node\">\n",
       "<title>Y</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"47\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"47\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Y</text>\n",
       "</g>\n",
       "<!-- T&#45;&gt;Y -->\n",
       "<g id=\"edge2\" class=\"edge\">\n",
       "<title>T&#45;&gt;Y</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M40.2711,-143.8314C41.2336,-136.131 42.3782,-126.9743 43.4479,-118.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"46.9309,-118.7702 44.6983,-108.4133 39.9849,-117.9019 46.9309,-118.7702\"/>\n",
       "</g>\n",
       "<!-- Y&#45;&gt;X -->\n",
       "<g id=\"edge3\" class=\"edge\">\n",
       "<title>Y&#45;&gt;X</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M42.0562,-72.2022C39.8206,-64.1541 37.1274,-54.4588 34.6407,-45.5067\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"38.0051,-44.541 31.9563,-35.8425 31.2605,-46.4145 38.0051,-44.541\"/>\n",
       "</g>\n",
       "<!-- Educ -->\n",
       "<g id=\"node4\" class=\"node\">\n",
       "<title>Educ</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"129\" cy=\"-162\" rx=\"28.0565\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"129\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Educ</text>\n",
       "</g>\n",
       "<!-- Investments -->\n",
       "<g id=\"node5\" class=\"node\">\n",
       "<title>Investments</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"140\" cy=\"-18\" rx=\"52.6865\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"140\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Investments</text>\n",
       "</g>\n",
       "<!-- Educ&#45;&gt;Investments -->\n",
       "<g id=\"edge4\" class=\"edge\">\n",
       "<title>Educ&#45;&gt;Investments</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M130.3932,-143.7623C132.2694,-119.201 135.6269,-75.2474 137.8341,-46.3541\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"141.3462,-46.3272 138.6182,-36.0896 134.3666,-45.794 141.3462,-46.3272\"/>\n",
       "</g>\n",
       "<!-- Wage -->\n",
       "<g id=\"node6\" class=\"node\">\n",
       "<title>Wage</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"178\" cy=\"-90\" rx=\"30.0438\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"178\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Wage</text>\n",
       "</g>\n",
       "<!-- Educ&#45;&gt;Wage -->\n",
       "<g id=\"edge5\" class=\"edge\">\n",
       "<title>Educ&#45;&gt;Wage</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M140.3647,-145.3008C146.3999,-136.4329 153.9447,-125.3465 160.6913,-115.4332\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"163.6238,-117.345 166.3566,-107.1086 157.8368,-113.4066 163.6238,-117.345\"/>\n",
       "</g>\n",
       "<!-- Wage&#45;&gt;Investments -->\n",
       "<g id=\"edge6\" class=\"edge\">\n",
       "<title>Wage&#45;&gt;Investments</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M168.8013,-72.5708C164.3862,-64.2055 158.9989,-53.998 154.0831,-44.6839\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"157.141,-42.9791 149.378,-35.7689 150.9503,-46.2464 157.141,-42.9791\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x11deda828>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"T\", \"X\")\n",
    "g.edge(\"T\", \"Y\")\n",
    "g.edge(\"Y\", \"X\")\n",
    "g.node(\"X\", \"X\", color=\"red\")\n",
    "\n",
    "g.edge(\"Educ\", \"Investments\")\n",
    "g.edge(\"Educ\", \"Wage\")\n",
    "g.edge(\"Wage\", \"Investments\")\n",
    "\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To demonstrate why this is the case, imagine that investments and education take only two values: people either invest or they don't, and they are either educated or they aren't. Initially, when we don't control for investments, the bias term is zero, \\\\(E[Y_0|T=1] - E[Y_0|T=0] = 0\\\\), because education was randomized. This means that the wage people would have had without education, \\\\(Wage_0\\\\), is on average the same whether or not they actually receive the education treatment. But what happens if we condition on investments?\n",
    "\n",
    "Looking at those who invest, we probably have \\\\(E[Y_0|T=0, I=1] > E[Y_0|T=1, I=1]\\\\). In words, among those who invest, the ones who manage to do so even without education depend less on education to achieve high earnings. For this reason, the wage those people have, \\\\(Wage_0|T=0\\\\), is probably higher than the wage the educated group would have had without education, \\\\(Wage_0|T=1\\\\). Similar reasoning applies to those who don't invest, where we also probably have \\\\(E[Y_0|T=0, I=0] > E[Y_0|T=1, I=0]\\\\). Those who don't invest even with education would probably have had a lower wage, had they not gotten the education, than those who don't invest and have no education.\n",
    "\n",
    "To use a purely graphical argument: if someone invests, knowing that they have high education explains away the second cause, which is wage. Conditioned on investing, higher education is associated with a lower \\\\(Y_0\\\\), and we have a negative bias, \\\\(E[Y_0|T=0, I=i] > E[Y_0|T=1, I=i]\\\\).\n",
    "\n",
    "Just as a side note, everything we've discussed is also true if we condition on any descendant of a common effect."
   ]
  },
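  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The negative bias argued above can be checked with a small simulation sketch. All numbers are made-up assumptions: education is a fair coin flip (so it is randomized), the wage without education, \\\\(Y_0\\\\), is standard normal, education adds 1 to the wage, and a person invests when education plus wage plus noise exceeds an arbitrary threshold of 1.5. Because the data are simulated, we observe \\\\(Y_0\\\\) for everyone and can compute the bias term directly. Without conditioning, the difference in means should be close to the true effect of 1; within each investment group, the bias term should come out negative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical data-generating process; all parameters are assumptions for illustration\n",
    "rng = np.random.default_rng(0)\n",
    "n = 100_000\n",
    "\n",
    "educ = rng.binomial(1, 0.5, size=n)  # randomized treatment T\n",
    "wage_0 = rng.normal(size=n)          # potential wage without education, Y_0\n",
    "wage = wage_0 + 1.0 * educ           # observed wage; true effect of education is 1\n",
    "# Investment is a collider: caused by both education and wage\n",
    "invest = (educ + wage + rng.normal(size=n) > 1.5).astype(int)\n",
    "\n",
    "def diff_in_means(y, t):\n",
    "    return y[t == 1].mean() - y[t == 0].mean()\n",
    "\n",
    "naive = diff_in_means(wage, educ)\n",
    "print(f'no conditioning: estimate {naive:.2f}')\n",
    "\n",
    "stratum_bias = {}\n",
    "for i in (0, 1):\n",
    "    m = invest == i\n",
    "    # Bias term E[Y_0|T=1, I=i] - E[Y_0|T=0, I=i], observable only because Y_0 is simulated\n",
    "    stratum_bias[i] = diff_in_means(wage_0[m], educ[m])\n",
    "    print(f'invest={i}: estimate {diff_in_means(wage[m], educ[m]):.2f}, bias {stratum_bias[i]:.2f}')"
   ]
  },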
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
       " -->\n",
       "<!-- Title: %3 Pages: 1 -->\n",
       "<svg width=\"90pt\" height=\"260pt\"\n",
       " viewBox=\"0.00 0.00 90.00 260.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 256)\">\n",
       "<title>%3</title>\n",
       "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-256 86,-256 86,4 -4,4\"/>\n",
       "<!-- T -->\n",
       "<g id=\"node1\" class=\"node\">\n",
       "<title>T</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-234\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-229.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">T</text>\n",
       "</g>\n",
       "<!-- X -->\n",
       "<g id=\"node2\" class=\"node\">\n",
       "<title>X</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"27\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X</text>\n",
       "</g>\n",
       "<!-- T&#45;&gt;X -->\n",
       "<g id=\"edge1\" class=\"edge\">\n",
       "<title>T&#45;&gt;X</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M23.6685,-216.0736C21.8913,-205.5982 19.8913,-192.0982 19,-180 17.8244,-164.0432 17.8244,-159.9568 19,-144 19.6267,-135.4935 20.8015,-126.2939 22.0616,-117.9399\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"25.5398,-118.3546 23.6685,-107.9264 18.6282,-117.2455 25.5398,-118.3546\"/>\n",
       "</g>\n",
       "<!-- Y -->\n",
       "<g id=\"node3\" class=\"node\">\n",
       "<title>Y</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"55\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"55\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Y</text>\n",
       "</g>\n",
       "<!-- T&#45;&gt;Y -->\n",
       "<g id=\"edge2\" class=\"edge\">\n",
       "<title>T&#45;&gt;Y</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M33.778,-216.5708C36.9872,-208.3187 40.8935,-198.2738 44.4758,-189.0623\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"47.8248,-190.1069 48.1874,-179.5182 41.3008,-187.5697 47.8248,-190.1069\"/>\n",
       "</g>\n",
       "<!-- S -->\n",
       "<g id=\"node4\" class=\"node\">\n",
       "<title>S</title>\n",
       "<ellipse fill=\"none\" stroke=\"#ff0000\" cx=\"27\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">S</text>\n",
       "</g>\n",
       "<!-- X&#45;&gt;S -->\n",
       "<g id=\"edge4\" class=\"edge\">\n",
       "<title>X&#45;&gt;S</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M27,-71.8314C27,-64.131 27,-54.9743 27,-46.4166\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"30.5001,-46.4132 27,-36.4133 23.5001,-46.4133 30.5001,-46.4132\"/>\n",
       "</g>\n",
       "<!-- Y&#45;&gt;X -->\n",
       "<g id=\"edge3\" class=\"edge\">\n",
       "<title>Y&#45;&gt;X</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M48.222,-144.5708C45.0128,-136.3187 41.1065,-126.2738 37.5242,-117.0623\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"40.6992,-115.5697 33.8126,-107.5182 34.1752,-118.1069 40.6992,-115.5697\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x11deda438>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"T\", \"X\")\n",
    "g.edge(\"T\", \"Y\")\n",
    "g.edge(\"Y\", \"X\")\n",
    "g.edge(\"X\", \"S\")\n",
    "g.node(\"S\", \"S\", color=\"red\")\n",
    "g"
   ]
  },
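  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conditioning on a descendant of a collider can be illustrated the same way. This sketch uses hypothetical linear equations that follow the graph above: T is randomized, Y is T plus noise (true effect of 1), the collider X is T plus Y plus noise, and S is X plus more noise. With these particular coefficients, regressing Y on T alone should give roughly 1, adding X should push the coefficient on T to roughly 0, and adding S instead should land in between, at roughly 1/3: still biased, though less so, since S is only a noisy copy of the collider. The helper `ols_coef` is hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical linear model matching the graph T -> Y, T -> X, Y -> X, X -> S\n",
    "rng = np.random.default_rng(1)\n",
    "n = 100_000\n",
    "\n",
    "t = rng.normal(size=n)            # randomized treatment\n",
    "y = 1.0 * t + rng.normal(size=n)  # outcome; true effect = 1\n",
    "x = t + y + rng.normal(size=n)    # collider\n",
    "s = x + rng.normal(size=n)        # descendant of the collider\n",
    "\n",
    "def ols_coef(y, *covariates):\n",
    "    # Hypothetical helper: OLS coefficient on the first covariate, with an intercept\n",
    "    X = np.column_stack([np.ones_like(y), *covariates])\n",
    "    return np.linalg.lstsq(X, y, rcond=None)[0][1]\n",
    "\n",
    "b_none = ols_coef(y, t)  # unbiased, roughly 1\n",
    "b_x = ols_coef(y, t, x)  # conditioning on the collider: roughly 0 in this model\n",
    "b_s = ols_coef(y, t, s)  # conditioning on its descendant: roughly 1/3, still biased\n",
    "print(f'Y~T: {b_none:.2f}, Y~T+X: {b_x:.2f}, Y~T+S: {b_s:.2f}')"
   ]
  },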
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A similar thing happens when we condition on a mediator of the treatment. A mediator is a variable between the treatment and the outcome. It, well, mediates the causal effect. For example, suppose again you are able to randomize education. But, just to be sure, you decide to control for whether or not the person had a white-collar job. Once again, this conditioning biases the estimate of the total causal effect. This time, not because it opens a path through a collider, but because it closes one of the channels through which the treatment operates. In our example, getting a white-collar job is one way that more education leads to higher pay. By controlling for it, we close this channel and leave open only the direct effect of education on wage."
   ]
  },
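  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small simulation sketch makes the mediator point concrete. The numbers are made-up assumptions: education is randomized, it raises the chance of holding a white-collar job from 20% to 70%, and wage increases by 1 from education directly and by 2 from having a white-collar job. Under these assumptions, the total effect of education on wage is 1 + 2 × 0.5 = 2. The simple difference in means should recover that total effect, while a regression that also controls for the white-collar job should return only the direct effect, roughly 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical data-generating process; all parameters are assumptions for illustration\n",
    "rng = np.random.default_rng(2)\n",
    "n = 100_000\n",
    "\n",
    "educ = rng.binomial(1, 0.5, size=n)               # randomized treatment\n",
    "white_collar = rng.binomial(1, 0.2 + 0.5 * educ)  # mediator: education raises P(white collar)\n",
    "wage = 1.0 * educ + 2.0 * white_collar + rng.normal(size=n)\n",
    "\n",
    "# Total effect: simple difference in means, valid because education is randomized\n",
    "total = wage[educ == 1].mean() - wage[educ == 0].mean()\n",
    "\n",
    "# Controlling for the mediator leaves only the direct effect of education\n",
    "X = np.column_stack([np.ones(n), educ, white_collar])\n",
    "direct = np.linalg.lstsq(X, wage, rcond=None)[0][1]\n",
    "print(f'total effect: {total:.2f}, controlling for white collar: {direct:.2f}')"
   ]
  },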
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
       " -->\n",
       "<!-- Title: %3 Pages: 1 -->\n",
       "<svg width=\"239pt\" height=\"188pt\"\n",
       " viewBox=\"0.00 0.00 238.77 188.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
       "<title>%3</title>\n",
       "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-184 234.7653,-184 234.7653,4 -4,4\"/>\n",
       "<!-- T -->\n",
       "<g id=\"node1\" class=\"node\">\n",
       "<title>T</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"55\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"55\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">T</text>\n",
       "</g>\n",
       "<!-- X -->\n",
       "<g id=\"node2\" class=\"node\">\n",
       "<title>X</title>\n",
       "<ellipse fill=\"none\" stroke=\"#ff0000\" cx=\"27\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">X</text>\n",
       "</g>\n",
       "<!-- T&#45;&gt;X -->\n",
       "<g id=\"edge1\" class=\"edge\">\n",
       "<title>T&#45;&gt;X</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M48.222,-144.5708C45.0128,-136.3187 41.1065,-126.2738 37.5242,-117.0623\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"40.6992,-115.5697 33.8126,-107.5182 34.1752,-118.1069 40.6992,-115.5697\"/>\n",
       "</g>\n",
       "<!-- Y -->\n",
       "<g id=\"node3\" class=\"node\">\n",
       "<title>Y</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"55\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"55\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Y</text>\n",
       "</g>\n",
       "<!-- T&#45;&gt;Y -->\n",
       "<g id=\"edge2\" class=\"edge\">\n",
       "<title>T&#45;&gt;Y</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M58.3315,-144.0736C60.1087,-133.5982 62.1087,-120.0982 63,-108 64.1756,-92.0432 64.1756,-87.9568 63,-72 62.3733,-63.4935 61.1985,-54.2939 59.9384,-45.9399\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"63.3718,-45.2455 58.3315,-35.9264 56.4602,-46.3546 63.3718,-45.2455\"/>\n",
       "</g>\n",
       "<!-- X&#45;&gt;Y -->\n",
       "<g id=\"edge3\" class=\"edge\">\n",
       "<title>X&#45;&gt;Y</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M33.778,-72.5708C36.9872,-64.3187 40.8935,-54.2738 44.4758,-45.0623\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"47.8248,-46.1069 48.1874,-35.5182 41.3008,-43.5697 47.8248,-46.1069\"/>\n",
       "</g>\n",
       "<!-- educ -->\n",
       "<g id=\"node4\" class=\"node\">\n",
       "<title>educ</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"202\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"202\" y=\"-157.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">educ</text>\n",
       "</g>\n",
       "<!-- white collar -->\n",
       "<g id=\"node5\" class=\"node\">\n",
       "<title>white collar</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"162\" cy=\"-90\" rx=\"52.1922\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"162\" y=\"-85.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">white collar</text>\n",
       "</g>\n",
       "<!-- educ&#45;&gt;white collar -->\n",
       "<g id=\"edge4\" class=\"edge\">\n",
       "<title>educ&#45;&gt;white collar</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M192.5206,-144.937C187.8423,-136.5161 182.0904,-126.1627 176.8459,-116.7226\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"179.7452,-114.7342 171.8291,-107.6924 173.626,-118.1338 179.7452,-114.7342\"/>\n",
       "</g>\n",
       "<!-- wage -->\n",
       "<g id=\"node6\" class=\"node\">\n",
       "<title>wage</title>\n",
       "<ellipse fill=\"none\" stroke=\"#000000\" cx=\"202\" cy=\"-18\" rx=\"28.5325\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"202\" y=\"-13.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">wage</text>\n",
       "</g>\n",
       "<!-- educ&#45;&gt;wage -->\n",
       "<g id=\"edge5\" class=\"edge\">\n",
       "<title>educ&#45;&gt;wage</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M210.7195,-144.491C215.376,-134.1547 220.626,-120.6547 223,-108 225.9502,-92.2743 225.9502,-87.7257 223,-72 221.2937,-62.9044 218.1016,-53.3722 214.712,-44.8918\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"217.8555,-43.3401 210.7195,-35.509 211.4144,-46.081 217.8555,-43.3401\"/>\n",
       "</g>\n",
       "<!-- white collar&#45;&gt;wage -->\n",
       "<g id=\"edge6\" class=\"edge\">\n",
       "<title>white collar&#45;&gt;wage</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M171.8876,-72.2022C176.6072,-63.7071 182.3463,-53.3767 187.5432,-44.0223\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"190.7046,-45.5386 192.5015,-35.0972 184.5855,-42.139 190.7046,-45.5386\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x11dedae80>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"T\", \"X\")\n",
    "g.edge(\"T\", \"Y\")\n",
    "g.edge(\"X\", \"Y\")\n",
    "g.node(\"X\", \"X\", color=\"red\")\n",
    "\n",
    "g.edge(\"educ\", \"white collar\")\n",
    "g.edge(\"educ\", \"wage\")\n",
    "g.edge(\"white collar\", \"wage\")\n",
    "\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To give a potential outcome argument, we know that, due to randomisation, the bias is zero \\\\(E[Y_0|T=0] - E[Y_0|T=1] = 0\\\\). However, if we condition on the white collar individuals, we have that \\\\(E[Y_0|T=0, WC=1] > E[Y_0|T=1, WC=1]\\\\). That is because those that manage to get a white collar job even without education are probably more hard working than those that required the help of education to get the same job. With the same reasoning, \\\\(E[Y_0|T=0, WC=0] > E[Y_0|T=1, WC=0]\\\\) because those that didn't get a white collar job even with education are probably less hard working than those that didn't, but also didn't have any education. \n",
    "\n",
    "In our case, conditioning on the mediator induces a negative bias. It makes the effect of education seem lower than it actually is. This is the case because the causal effect is positive. If the effect were negative, conditioning on a mediator would have a positive bias. In all cases, this sort of conditioning makes the effect look weaker than it is. \n",
    "\n",
    "To put it in a more prosaic way, suppose that you have to choose between two candidates for a job at your company. Both have equally impressive professional achievements, but one does not have a higher education degree. Which one should you choose? Of course, you should go with the one without the higher education, because he managed to achieve the same things as the other one but had the odds stacked against him.\n",
    "\n",
    "![image.png](./data/img/causal-graph/three_bias.png)\n",
    "\n",
    "## Key Ideas\n",
    "\n",
    "We've studied graphical models as a language to better understand and express causality ideas. We did a quick summary on the rules of conditional independence on a graph. This helped us then explore three structures that can lead to bias.\n",
    "\n",
    "The first one was confounding, which happens when treatment and outcome have a common cause that we don't account or control for. The second is selection bias due to conditioning on a common effect. This excessive controlling can lead to bias even if the treatment was randomly assigned. The third structure is also a form of selection bias, this time due to excessive controlling of mediator variables. Selection bias can often be fixed by simply doing nothing, which is why it is so dangerous. Since we are biased to action, we tend to see ideas that control for things as clever, when they can be doing more harm than good. \n",
    "\n",
    "## References\n",
    "\n",
    "I like to think of this entire book as a tribute to Joshua Angrist, Alberto Abadie and Christopher Walters for their amazing Econometrics class. Most of the ideas here are taken from their classes at the American Economic Association. Watching them is what is keeping me sane during this tough year of 2020.\n",
    "* [Cross-Section Econometrics](https://www.aeaweb.org/conference/cont-ed/2017-webcasts)\n",
    "* [Mastering Mostly Harmless Econometrics](https://www.aeaweb.org/conference/cont-ed/2020-webcasts)\n",
    "\n",
    "I'll also like to reference the amazing books from Angrist. They have shown me that Econometrics, or 'Metrics as they call it, is not only extremely useful but also profoundly fun.\n",
    "\n",
    "* [Mostly Harmless Econometrics](https://www.mostlyharmlesseconometrics.com/)\n",
    "* [Mastering 'Metrics](https://www.masteringmetrics.com/)\n",
    "\n",
    "My final reference is Miguel Hernan and Jamie Robins' book. It has been my trustworthy companion in the most thorny causal questions I had to answer.\n",
    "\n",
    "* [Causal Inference Book](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/)\n",
    "\n",
    "\n",
    "![img](./data/img/poetry.png)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
